Max Pellert (https://mpellert.at)
Deep Learning for the Social Sciences
Professor for Social and Behavioural Data Science (interim, W2) at the University of Konstanz
Assistant Professor (Business School of the University of Mannheim)
I worked in industry at SONY Computer Science Laboratories in Rome, Italy
PhD from the Complexity Science Hub Vienna and the Medical University of Vienna in Computational Social Science
Studies in Psychology and History and Philosophy of Science
MSc in Cognitive Science and BSc in Economics (both University of Vienna)
Computational Social Science
Digital traces
Affective expression in text
Natural Language Processing
Collective emotions
Belief updating
Psychometrics of AI
PostDoc at the University of Konstanz
Junior Research Fellow of the Complexity Science Hub Vienna
Consultant of the International Labour Office
PhD in Physics at Sapienza University, the Enrico Fermi Research Center and the Sapienza School for Advanced Studies in Rome
MSc in Theoretical Physics
Research Interests
Complex Digital Systems
Social Networks
Recommendation Algorithms
Large Language Models
AI
| Date | Topic | Who? |
|---|---|---|
| 9.4. | Logistics & Motivation | Max |
| 16.4. | Supervised Learning | Max |
| 23.4. | Shallow Neural Nets | Max |
| 30.4. | Perceptron and Multi Layer Perceptrons | Giordano |
| 7.5. | Convolutional Neural Networks | Giordano |
| 14.5. | Graph Neural Networks | Giordano |
| 21.5. | NN for Time Series analysis | Giordano |
| Date | Topic | Who? |
|---|---|---|
| 28.5. | No class | |
| 4.6. | Generative Deep Learning 1 | Giordano |
| 11.6. | NLP 1 | Max |
| 18.6. | NLP 2 | Max |
| 25.6. | Reinforcement Learning | Giordano |
| 2.7. | Large Language Models | Max |
| 9.7. | Generative Deep Learning 2 | Giordano |
| 16.7. | Outlook | Max |
Lectures on Tuesdays, 10:00-11:30, in C421
Exercises (practical sessions) are provided over the semester on Wednesdays, 13:30-15:00, in D430
Your tutor will be Andri Rutschmann; he will co-teach the tutorials with us
Assignments will be released on Tuesday evening or, at the latest, on Wednesday before the tutorial
Each deadline falls before a tutorial a few weeks after release
Assignment submissions through Github as in ICSS
4 assignments count for 40% of the final grade of the course (10% each)
The project counts for 60% of the grade
More information about the project will be delivered soon…
Prince, S. J. D. (2023). Understanding deep learning. The MIT Press.
Available in print or for free as a PDF: https://udlbook.github.io/udlbook/
On the webpage you will find many additional materials
We will cover many topics from the book, and it also serves as additional material to deepen your knowledge of specific aspects
In addition to the contents covered in the book, in this course we aim to keep the focus on applications in the social sciences
Basic workflow: Define a mapping from input to output
Learn this mapping from paired input/output data examples
Often, the examples come from data sets of inputs that have been manually annotated by humans, i.e. the outputs are human-labeled supervisory signals
Often, the annotation is done by crowdworkers (if the task is not already outsourced to another model)
| Problem type | Example network |
|---|---|
| Univariate regression (one output, real value) | Fully connected network |
| Multivariate regression (>1 output, real values) | Graph neural network |
| Binary classification (two discrete classes) | Transformer network |
| Multiclass classification (>2 discrete classes) | Recurrent neural network (RNN) |
| Multiclass classification (>2 discrete classes) | Convolutional network |
An equation relating input (age) to output (height)
Search through family of possible equations to find one that fits training data well
Deep neural networks are just a very flexible family of equations
Fitting deep neural networks = “Deep Learning”
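A minimal sketch of this idea, with made-up (age, height) numbers and the simplest possible family of equations, a straight line, in place of a deep network:

```python
import numpy as np

# Toy paired training data: input = age (years), output = height (cm)
ages = np.array([2.0, 5.0, 8.0, 11.0, 14.0])
heights = np.array([86.0, 109.0, 128.0, 144.0, 163.0])

# Family of candidate equations: height = a * age + b
# Least squares searches this family for the (a, b) that fits the data best
a, b = np.polyfit(ages, heights, deg=1)

def predict(age):
    return a * age + b

print(f"fitted: height = {a:.2f} * age + {b:.2f}")
print(f"predicted height at age 7: {predict(7.0):.1f} cm")
```

A deep neural network replaces the straight line with a much more flexible family of equations, but the workflow (pick a family, fit it to paired examples) is the same.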
| Problem type | Example network |
|---|---|
| Multivariate binary classification (many outputs, two discrete classes) | Convolutional encoder-decoder network |
| Multivariate regression (many outputs, continuous) | Convolutional encoder-decoder network |
Regression = continuous numbers as output
Classification = discrete classes as output
Two class (binary) and multiclass classification treated differently
Multilabel = zero or more of x discrete classes
Univariate = one output
Multivariate = more than one output
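The taxonomy above maps directly onto the shape and type of a model's output; a small illustration with invented values:

```python
import numpy as np

# Hypothetical model outputs illustrating the taxonomy above
univariate_regression = np.array([172.4])               # one continuous value
multivariate_regression = np.array([12.1, -0.3, 7.8])   # several continuous values

binary_classification = 1        # one of two discrete classes {0, 1}
multiclass_classification = 4    # one of >2 discrete classes, e.g. {0, ..., 9}

# Multilabel: zero or more of x discrete classes can apply at the same time
multilabel = np.array([1, 0, 1, 0, 0])  # classes 0 and 2 apply, out of x = 5

print(univariate_regression.shape, multivariate_regression.shape)
print(multilabel.sum(), "labels active")
```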
Very complex relationship between input and output
Sometimes we may have many possible valid answers (think of translation for example)
But outputs (and sometimes inputs) obey rules
Can we learn the “grammar” of the data from unlabeled examples?
Can use an enormous amount of data to do this (as we don’t need costly labels)
This has the potential to make the supervised learning task easier by providing a lot of general knowledge about possible outputs (about grammatically correct sentences, for example)
Learning about a dataset without labels
For example:
Clustering
Finding outliers
Generating new examples
Filling in missing data
In this course, we focus primarily on supervised approaches, but boundaries are sometimes a bit fuzzy as “self-supervision” shows
We can also create large amounts of “free” labeled data ourselves with two main approaches:
Generative self-supervised learning masks part of each data example and the task is to predict the masked part (this way we get a “label”)
For example, take a corpus of unlabeled images, remove a part of each image and try to fill in (“inpaint”) the missing part
Or we might take a large corpus of text (from the internet) and mask some words that we then try to predict
Or we might take cut off texts and try to predict the word that follows after the cut-off
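A minimal sketch of how such "free" labels can be manufactured from unlabeled text (the helper names below are ours, just for illustration):

```python
import random

def make_masked_example(sentence, mask_token="[MASK]", seed=None):
    """Turn an unlabeled sentence into a (masked input, label) pair
    by hiding one randomly chosen word, as in masked language modeling."""
    rng = random.Random(seed)
    words = sentence.split()
    i = rng.randrange(len(words))
    label = words[i]
    words[i] = mask_token
    return " ".join(words), label

def make_next_word_example(sentence, cut=4):
    """Alternative: cut the text off and predict the word that follows."""
    words = sentence.split()
    return " ".join(words[:cut]), words[cut]

masked, label = make_masked_example("deep learning needs a lot of data", seed=0)
print(masked, "->", label)

context, next_word = make_next_word_example("deep learning needs a lot of data")
print(context, "->", next_word)  # "deep learning needs a" -> "lot"
```

Each unlabeled sentence yields a labeled training example, which is why these approaches scale to enormous corpora.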
Contrastive self-supervised learning uses pairs of examples that have a relationship and compares them to unrelated pairs.
With images, we could set up the task to decide if pairs of images are transformed versions of one another or if they are unconnected
Or, with text, we can determine if two sentences follow each other in the original document or not
We can also establish if two sentences are logically related
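The pair-construction idea can be sketched in a few lines (helper name is ours; negatives are drawn at random, so a "negative" can occasionally coincide with a true neighbour):

```python
import random

def make_contrastive_pairs(sentences, seed=0):
    """Build labeled pairs from an unlabeled document:
    positive (1) = two consecutive sentences,
    negative (0) = a sentence paired with a randomly drawn one."""
    rng = random.Random(seed)
    pairs = []
    for i in range(len(sentences) - 1):
        pairs.append((sentences[i], sentences[i + 1], 1))  # related
        j = rng.randrange(len(sentences))
        pairs.append((sentences[i], sentences[j], 0))      # (likely) unrelated
    return pairs

doc = ["First sentence.", "It is followed by this one.", "And then by this."]
for a, b, y in make_contrastive_pairs(doc):
    print(y, "|", a, "||", b)
```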
-> A lot of potential for creative approaches using and transforming found data
1958 Perceptron (Simple “neural” model)
1986 Backpropagation (Practical deep neural networks)
1989 Convolutional networks (Supervised learning)
2012 AlexNet Image classification (Supervised learning)
2014 Generative adversarial networks (Unsupervised learning)
2014 Deep Q-Learning - Atari games (Reinforcement learning)
2016 AlphaGo (Reinforcement learning)
2017 Machine translation (Supervised learning)
2019 Language models ((Un)supervised learning)
2022 Dall-E2 Image synthesis from text prompts ((Un)supervised learning)
2022 ChatGPT ((Un)supervised learning)
2023 GPT4 Multimodal model ((Un)supervised learning)
The Hugging Face Model Hub can give you an idea about the vast number of possible application areas
Also check out the Data Set Hub and Hugging Face Spaces
Spaces are often used for demos and to showcase interesting models and their applications
You can also rent dedicated hardware (billed by the minute, usually very cheap) to run spaces privately without queues
Endless research opportunities using “Text as Data”
Grimmer, J., Roberts, M. E., & Stewart, B. M. (2022). Text as data: A new framework for machine learning and the social sciences. Princeton University Press.
Text data can come from social media for example and be analysed for sentiment, emotions, arguments, stance, …
“We propose and explore the possibility that language models can be studied as effective proxies for specific human subpopulations in social science research.”
“Practical and research applications of artificial intelligence tools have sometimes been limited by problematic biases (such as racism or sexism), which are often treated as uniform properties of the models. We show that the “algorithmic bias” within one such tool—the GPT-3 language model—is instead both fine-grained and demographically correlated, meaning that proper conditioning will cause it to accurately emulate response distributions from a wide variety of human subgroups. We term this property algorithmic fidelity and explore its extent in GPT-3.”
“We create ‘silicon samples’ by conditioning the model on thousands of sociodemographic backstories from real human participants in multiple large surveys conducted in the United States. We then compare the silicon and human samples to demonstrate that the information contained in GPT-3 goes far beyond surface similarity. It is nuanced, multifaceted, and reflects the complex interplay between ideas, attitudes, and sociocultural context that characterize human attitudes.”
Synthetic data: in-silico replication of experiments
In the vision domain: image classification, for example of satellite images to count attendance at events (cars) or migration flows
Whisper: analysis of transcripts of videos (for example from YouTube) with NLP models
In the (near) future, more tools for analysing videos directly?
Promising advances in video generation for example
“[…] using two-parameter logistic regression (that is, one neuron) and obtain the same performance as that of the 13,451-parameter DNN.”
“We further show that a logistic regression based on the measured distance and mainshock average slip (instead of derived stresses) performs better than the DNN.”
“Before commenting on the interesting philosophical issues raised by Mignan and Broccardo, I note that the authors were able to reproduce the results presented in our paper (available at https://github.com/phoebemrdevries/Learning-aftershock-location-patterns).”
“The perspective presented in our paper is that it was interesting to discover that a neural network learned a simple, non-exotic combination of stresses that provided considerably improved precision.”
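The "two-parameter logistic regression (that is, one neuron)" from the exchange above can be sketched on synthetic data; the numbers below are invented for illustration and are not from the aftershock study:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Synthetic stand-in: one input feature (e.g. distance to the mainshock),
# binary output (aftershock in a grid cell or not). Not the real data.
rng = np.random.default_rng(0)
x = rng.uniform(0, 100, size=500)
y = (x < 30).astype(float)  # closer cells more likely to contain aftershocks

# One neuron = logistic regression with two parameters: weight w and bias b
w, b = 0.0, 0.0
lr = 0.1
xs = (x - x.mean()) / x.std()  # standardize for stable gradient descent
for _ in range(2000):
    p = sigmoid(w * xs + b)
    w -= lr * np.mean((p - y) * xs)  # gradient of mean cross-entropy wrt w
    b -= lr * np.mean(p - y)         # gradient wrt b

acc = np.mean((sigmoid(w * xs + b) > 0.5) == y)
print(f"w={w:.2f}, b={b:.2f}, training accuracy={acc:.2f}")
```

The point of the exchange: on some problems a model this simple can match a network with thousands of parameters, which is worth checking before reaching for deep learning.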
The AI industry, as many other parts of the economy, depends heavily on the attention of all kinds of stakeholders to attract funding
Fanning the flames with musings about the imminent arrival of Artificial General Intelligence (AGI) is part of that game
We also see special emphasis on extreme dangers that are only very remotely likely (if possible at all)
This can also, maybe counterintuitively, be seen as beneficial
Polemically: “Fund us because only we can protect you”
Inflated expectations can also backfire, see "AI Winter"
On the other hand, it is impossible to deny the progress of recent years in areas such as NLP, and also in modalities other than text, such as images and video (both analysis and generation)
Standardized benchmark tests are indicators of the speed of progress (but still imperfect measures)
There is also a rewarding niche for experts who by default talk down each and every achievement
Often, these experts have little to contribute beyond that general criticism
There is a lot of questionable information around on “AI”